
194 research outputs found

A Variable Metric Probabilistic k-Nearest-Neighbours Classifier

Author: B. Larget
C. Holmes
D. Denison
J. Fan
J.P. Myles
T. Cover
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Copyright © 2004 Springer Verlag. The final publication is available at link.springer.com
Book title: Intelligent Data Engineering and Automated Learning – IDEAL 2004. 5th International Conference, Exeter, UK, August 25-27, 2004, Proceedings.

The k-nearest neighbour (k-nn) model is a simple, popular classifier. Probabilistic k-nn is a more powerful variant in which the model is cast in a Bayesian framework, using (reversible jump) Markov chain Monte Carlo methods to average out the uncertainty over the model parameters. The k-nn classifier depends crucially on the metric used to determine distances between data points. However, scalings between features, and indeed whether some subset of features is redundant, are seldom known a priori. Here we introduce a variable metric extension to the probabilistic k-nn classifier, which permits averaging over all rotations and scalings of the data. In addition, the method permits automatic rejection of irrelevant features. Examples are provided on synthetic data, illustrating how the method can deform feature space and select salient features, and also on real-world data.
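The dependence of k-nn on the metric can be illustrated with a small sketch. This is not the paper's method: it uses a plain majority-vote k-nn with fixed per-feature scales instead of averaging over rotations and scalings by MCMC. The function names and the toy data are hypothetical.

```python
from collections import Counter
import math

def scaled_distance(a, b, scales):
    """Euclidean distance after multiplying each feature difference by its scale."""
    return math.sqrt(sum((s * (x - y)) ** 2 for s, x, y in zip(scales, a, b)))

def knn_predict(train, query, k, scales):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: scaled_distance(item[0], query, scales))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 carries the class signal, feature 1 is noise.
train = [
    ((0.0, 5.0), "a"), ((0.2, -4.0), "a"), ((0.1, 9.0), "a"),
    ((1.0, 0.5), "b"), ((1.2, -0.5), "b"), ((0.9, 0.2), "b"),
]
query = (0.1, 0.0)

# Equal scales: the noisy feature dominates the distances.
print(knn_predict(train, query, k=3, scales=(1.0, 1.0)))  # prints "b"
# Scale 0 on feature 1 rejects it entirely, and the prediction flips.
print(knn_predict(train, query, k=3, scales=(1.0, 0.0)))  # prints "a"
```

Setting a scale to zero is the crudest form of feature rejection; the abstract describes this choice being inferred from the data, not fixed by hand.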

CiteSeerX

Crossref

Open Research Exeter

Cardinality constrained portfolio optimisation

Author: B. Larget
H. Markowitz
J. Campbell
J. Fieldsend
K. Deb
K. Li
L. Fisher
T.J. Chang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Copyright © 2004 Springer-Verlag Berlin Heidelberg. The final publication is available at link.springer.com
Book title: Intelligent Data Engineering and Automated Learning – IDEAL 2004. 5th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2004), Exeter, UK, August 25-27, 2004.

The traditional quadratic programming approach to portfolio optimisation is difficult to implement when there are cardinality constraints. Recent approaches to resolving this have used heuristic algorithms to search for points on the cardinality constrained frontier. However, these can be computationally expensive when the practitioner does not know a priori exactly how many assets they may desire in a portfolio, or what level of return/risk they wish to be exposed to without recourse to analysing the actual trade-off frontier. This study introduces a parallel solution to this problem. By extending techniques developed in the multi-objective evolutionary optimisation domain, a set of portfolios representing estimates of all possible cardinality constrained frontiers can be found in a single search process, for a range of portfolio sizes and constraints. Empirical results are provided on emerging markets and US asset data, and compared to unconstrained frontiers found by quadratic programming.
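The idea of keeping one risk/return frontier per portfolio size can be sketched as a non-dominance filter. This only illustrates the bookkeeping, not the evolutionary search described in the abstract; the function names and the candidate portfolios are hypothetical.

```python
from collections import defaultdict

def dominates(p, q):
    """p and q are (risk, expected_return); lower risk and higher return are better."""
    return p[0] <= q[0] and p[1] >= q[1] and p != q

def frontiers_by_cardinality(candidates):
    """Group (cardinality, risk, expected_return) candidates by cardinality
    and keep only the non-dominated (risk, return) points in each group."""
    groups = defaultdict(list)
    for k, risk, ret in candidates:
        groups[k].append((risk, ret))
    return {
        k: sorted(p for p in pts if not any(dominates(q, p) for q in pts))
        for k, pts in groups.items()
    }

# Hypothetical candidates: (number of assets, risk, expected return).
candidates = [
    (2, 0.10, 0.05), (2, 0.12, 0.04), (2, 0.15, 0.08),
    (3, 0.08, 0.05), (3, 0.09, 0.05), (3, 0.11, 0.07),
]
for k, frontier in sorted(frontiers_by_cardinality(candidates).items()):
    print(k, frontier)
```

A search process that evaluates many candidate portfolios can feed them all through such a filter, so that frontiers for every portfolio size are maintained at once.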

Crossref

Open Research Exeter

In search of lost introns

Author: Adachi
Aldous
Altschul
Bieri
Blum
Carmel
Collins
Coulombe-Huntington
Csűrös
Csűrös
Devroye
Durbin
Edgar
Felsenstein
Felsenstein
Felsenstein
Friedman
Guindon
Harding
Heard
Hubbard
Igor B. Rogozin
IHBSC
J. Andrew Holey
Jeffares
Kececioglu
Kosakovsky Pond
Larget
Ma
Marchler-Bauer
McDiarmid
McKenzie
Miklós Csűrös
Müller
Nguyen
Nielsen
Nixon
Press
Pruitt
Raible
Rogozin
Rogozin
Rosenberg
Roy
Roy
Roy
Roy
Stamatakis
Steel
Sverdlov
Sverdlov
Tatusov
Vaňácová
Zhang
Publication date: 03/02/2007

Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns, and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now.

arXiv.org e-Print Archive

Crossref

College of Saint Benedict and Saint John’s University: DigitalCommons@CSB/SJU

Observability and nonlinear filtering

Author: B. Larget
C. Dellacherie
D. Ocone
D. Revuz
D. Williams
F. Kochman
G. Basile
G. Nicolao De
H. Ito
H. Kunita
J.B. Conway
J.M.C. Clark
L. Gurvits
M. Rao
M.S. Bartlett
O. Kallenberg
P. Baxendale
P. Billingsley
P. Chigansky
P. Park
R. Atar
R. Triggiani
R.S. Bucy
R.S. Liptser
R.Z. Has’minskiĭ
Ramon van Handel
S.N. Ethier
T. Kaijser
W. Rudin
X.M. Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/06/2008

This paper develops a connection between the asymptotic stability of nonlinear filters and a notion of observability. We consider a general class of hidden Markov models in continuous time with compact signal state space, and call such a model observable if no two initial measures of the signal process give rise to the same law of the observation process. We demonstrate that observability implies stability of the filter, i.e., the filtered estimates become insensitive to the initial measure at large times. For the special case where the signal is a finite-state Markov process and the observations are of the white noise type, a complete (necessary and sufficient) characterization of filter stability is obtained in terms of a slightly weaker detectability condition. In addition to observability, the role of controllability in filter stability is explored. Finally, the results are partially extended to non-compact signal state spaces

arXiv.org e-Print Archive

Crossref

Caltech Authors

Dependence of paracentric inversion rate on tract length

Author: A Brehm
AH Sturtevant
B Larget
GA Watterson
I Miklos
J Kececioglu
K Yogeeswaran
M Caceres
R Durrett
R Pinter
Rasmus Nielsen
Rick Durrett
S Hannenhalli
Thomas L York
TL York
V Bafna
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. RESULTS: We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate compared to paracentric inversions. The average paracentric inversion tract length is approx. 4.8 Mb with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. CONCLUSION: The method developed in this paper provides the first statistical estimator for estimating the distribution of inversion tract lengths from marker data. Application of this method for a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will get accepted
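The baseline claim in the abstract, that uniform and independent breakpoints already favour short tracts, can be checked with a short sketch. For breakpoints U and V independent and uniform on an arm of length A, the tract length |U - V| has density 2(A - x)/A² on [0, A], which decreases in x, and P(|U - V| < A/2) = 3/4. The arm length of 20 (in Mb) and the function names are hypothetical; this is not the paper's MCMC estimator.

```python
import random

def tract_length_density(length, arm):
    """Density of |U - V| for U, V independent and uniform on [0, arm]."""
    if not 0 <= length <= arm:
        return 0.0
    return 2.0 * (arm - length) / arm ** 2

def simulate_short_fraction(arm, n, seed=0):
    """Fraction of simulated tracts shorter than half the arm length."""
    rng = random.Random(seed)
    short = sum(
        abs(rng.uniform(0, arm) - rng.uniform(0, arm)) < arm / 2
        for _ in range(n)
    )
    return short / n

arm = 20.0  # hypothetical arm length in Mb
print(tract_length_density(0.0, arm), tract_length_density(arm / 2, arm))
print(simulate_short_fraction(arm, 20000))  # close to the exact value 0.75
```

The abstract reports an even stronger excess of short tracts than this uniform baseline predicts, which is what motivates the suggested correlation between breakpoint positions.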

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

eScholarship - University of California

Global distribution of two fungal pathogens threatening endangered sea turtles

Author: A Gargas
A Peters
A Rambaut
AD Phillott
AD Phillott
AD Phillott
AD Phillott
AD Phillott
Adolfo Marco
AG Kluge
Andrea D. Phillott
B Hube
B Larget
BP Wallace
CE Blanck
DPG Short
DPG Short
E Abella-Pérez
Elena Abella-Pérez
F Lutzoni
F Rodríguez
F Ronquist
Güçlü Ozr
H Ray
HI Nirenberg
J Davenport
J Diéguez-Uribeondo
J Diéguez-Uribeondo
J Felsenstein
J Wyneken
Javier Diéguez-Uribeondo
JE Adaskaveg
JJ Coleman
JM Sarmiento-Ramírez
Jolene Sim
JP Huelsenbeck
JS Farris
Jullie M. Sarmiento-Ramírez
K Kim
K O'Donnell
K O'Donnell
K Söderhäll
Kenneth Söderhäll
KL Eckert
María P. Martín
MC Fisher
MJ Fernández-Benéitez
MW Vandersea
N Kitancharoen
N Mrosovsky
N Mrosovsky
N Zhang
Pieter van West
R Vilgalys
RA Acuña-Mesén
RC Summerbell
RDM Page
SA Cameron
SA Rehner
SN Stuart
T Aoki
T Aoki
YJ Liu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 21/01/2014

This work was supported by grants from the Ministerio de Ciencia e Innovación, Spain (CGL2009-10032, CGL2012-32934). J.M.S.R. was supported by a PhD fellowship of the CSIC (JAEPre 0901804). The Natural Environment Research Council and the Biotechnology and Biological Sciences Research Council supported P.V.W. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Machalilla National Park in Ecuador, Pacuare Nature Reserve in Costa Rica, Foundations Natura 2000 in Cape Verde and Equilibrio Azul in Ecuador, Dr. Jesus Muñoz, Dr. Ian Bell, and Dr. Juan Patiño for help and technical support during sampling.
Peer reviewed. Publisher PD

Aberdeen University Research Archive

ResearchOnline@JCU

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Digital.CSIC

FigShare

A framework for orthology assignment from gene rearrangement data

Author: A. Caprara
B. Larget
B.M.E. Moret
C. Thach Nguyen
D. Bryant
D. Sankoff
D. Sankoff
D. Sankoff
D. Sankoff
D.A. Bader
G. Tesler
J. Earnest-DeYoung
J. Tang
J.L. Boore
J.L. Boore
K.M. Swenson
M. Blanchette
M. Marron
M.E. Cosner
N. El-Mabrouk
N. El-Mabrouk
S. Hannenhalli
S.R. Downie
X. Chen
Publication venue: Springer
Publication date: 01/01/2005

Abstract. Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive when comparing nuclear genomes. The main challenge is how to deal with gene families, specifically, how to identify orthologs. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. We approach that problem using gene rearrangement data, provide an optimization framework in which to phrase the problem, and present some preliminary theoretical results.

CiteSeerX

Crossref

Phylogenetic analysis of Croatian orf viruses isolated from sheep and goats

Author: A De la Concha-Bermejillo
AA Mercer
Ana Beck
B Larget
B Mondal
Branko Sostaric
C Mazur
CW Livingston
DJ McKeever
F Rheinbaben
FA Murphy
G Delhon
Ivana Lojkic
J Felsenstein
JP Huelsenbeck
JS Abrahao
JT Sullivan
K Tamura
K Zhang
K Zhao
M Hosamani
M Mahmoud
MK Tikkanen
PF Nettleton
RC Gumbrell
S Cvetnic
SB Fleming
T Vikoren
TH Jukes
Tomislav Bedekovic
Y Inoshima
Zeljko Cac
Zeljko Cvetnic
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The Orf virus (ORFV) is the prototype of the parapoxvirus genus and primarily causes contagious ecthyma in goats, sheep, and other ruminants worldwide. In this paper, we describe the sequence and phylogenetic analysis of the B2L gene of ORFV from two natural outbreaks: (i) in autochthonous Croatian Cres-breed sheep and (ii) on a small family goat farm. Results: Sequence and phylogenetic analyses of the ORFV B2L gene showed that Cro-Cres-12446/09 and Cro-Goat-11727/10 did not cluster together. Cro-Cres-12446/09 shared the highest similarity with ORFV NZ2 from New Zealand and Ena from Japan; Cro-Goat-11727/10 was closest to HuB from China and to Taiping and Hoping from Taiwan. Conclusion: Distinct ORFV strains are circulating in Croatia. Although ORFV infections are found ubiquitously wherever sheep and goats are farmed in Croatia, this is the first information on the genetic relatedness of any Croatian ORFV to other isolates around the world.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Sampling solution traces for the problem of sorting permutations by signed reversals

Author: A Bergeron
A Bergeron
A Bergeron
AC Siepel
AE Darling
B Larget
B Larget
Christian Baudet
D Sankoff
DA Bader
E Tannier
G Badr
I Miklós
I Miklós
I Miklós
JF Lefebvre
KM Swenson
KM Swenson
KM Swenson
M Cáceres
Marie-France Sagot
MDV Braga
MDV Braga
MDV Braga
MDV Braga
MDV Braga
P Cartier
R Durrett
S Hannenhalli
S Hannenhalli
S Yancopoulos
TL York
Zanoni Dias
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

International audience

Background: Traditional algorithms to solve the problem of sorting by signed reversals output just one optimal solution while the space of all optimal solutions can be huge. A so-called trace represents a group of solutions which share the same set of reversals that must be applied to sort the original permutation following a partial ordering. By using traces, we therefore can represent the set of optimal solutions in a more compact way. Algorithms for enumerating the complete set of traces of solutions were developed. However, due to their exponential complexity, their practical use is limited to small permutations. A partial enumeration of traces is a sampling of the complete set of traces and can be an alternative for the study of distinct evolutionary scenarios of big permutations. Ideally, the sampling should be done uniformly from the space of all optimal solutions. This is however conjectured to be ♯P-complete.

Results: We propose and evaluate three algorithms for producing a sampling of the complete set of traces that instead can be shown in practice to preserve some of the characteristics of the space of all solutions. The first algorithm (RA) performs the construction of traces through a random selection of reversals on the list of optimal 1-sequences. The second algorithm (DFALT) consists in a slight modification of an algorithm that performs the complete enumeration of traces. Finally, the third algorithm (SWA) is based on a sliding window strategy to improve the enumeration of traces. All proposed algorithms were able to enumerate traces for permutations with up to 200 elements.

Conclusions: We analysed the distribution of the enumerated traces with respect to their height and average reversal length. Various works indicate that the reversal length can be an important aspect in genome rearrangements. The algorithms RA and SWA show a tendency to lose traces with high average reversal length. Such traces are however rare, and qualitatively our results show that, for testable-sized permutations, the algorithms DFALT and SWA produce distributions which approximate the reversal length distributions observed with a complete enumeration of the set of traces.
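As a minimal illustration of the underlying operation, the sketch below applies signed reversals to a small permutation. It does not implement RA, DFALT, or SWA, and it makes no claim that the sequences shown are optimal; the function name and the example permutation are hypothetical.

```python
def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive, 0-based) and flip the sign of each element."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]

perm = [3, -2, -1, 4]  # hypothetical signed permutation

# Ordering A: reverse the block holding elements {3, 2, 1}, then flip element 3.
a = apply_reversal(apply_reversal(perm, 0, 2), 2, 2)

# Ordering B: flip element 3 first, then reverse the block holding {3, 2, 1}.
b = apply_reversal(apply_reversal(perm, 0, 0), 0, 2)

print(a, b)  # both are [1, 2, 3, 4]
```

Both orderings apply reversals to the same two sets of elements, only in a different order, which is the kind of redundancy that grouping solutions by their shared set of reversals is meant to remove.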

CiteSeerX

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

Hal-Diderot

Probabilistic Phylogenetic Inference with Insertions and Deletions

Author: A Pang
A Siepel
A Siepel
A Stamatakis
AD Smith
B Boussau
B Knudsen
B Knudsen
B Knudsen
B Larget
B Mau
B Mau
B Qian
B Qian
B Qian
B Rannala
C Kosiol
C Moler
D Metzler
D Simon
David Haussler
DF Robinson
DG Hwang
DL Swofford
E Rivas
Elena Rivas
F Ronquist
G Lunter
G Lunter
G Lunter
G McGuire
GA Churchill
GJ Mitchison
GJ Mitchison
I Holmes
I Holmes
I Holmes
I Miklós
I Miklós
J Adachi
J Felsenstein
J Felsenstein
J Felsenstein
J Felsenstein
J Hein
J Hein
J Hein
J Kim
J Stoye
J Wang
JD McAuliffe
JJ Cannone
JL Thorne
JL Thorne
JL Thorne
JP Huelsenbeck
JS Pedersen
L Chindelevitch
L Coin
M Blanchette
M Dayhoff
M Gribskov
M Hasegawa
M Kimura
M Steel
MJ Bishop
MK Kuhner
MS Chang
N Goldman
P Liò
PD Keightley
R Durbin
R Fleissner
S Guindon
S Karlin
S Tavaré
S Whelan
Sean R. Eddy
SV Muse
TH Jukes
W Cai
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2008

A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm
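The idea of augmenting the alphabet with a gap character while keeping the peeling recursion can be sketched as follows. This is an illustration under strong simplifying assumptions and is not the model of the paper or the dnamlε program: it uses a reversible equal-rates process on five states with a closed-form transition probability and a uniform root distribution, whereas the abstract describes a non-reversible birth-death model. The tree encoding and all names are hypothetical.

```python
import math

ALPHABET = "ACGT-"  # four nucleotides plus the gap treated as a fifth state

def transition_prob(a, b, t, rate=1.0):
    """Equal-rates model on five states (a simplifying assumption)."""
    k = len(ALPHABET)
    decay = math.exp(-k * rate * t)
    return 1.0 / k + (k - 1) / k * decay if a == b else 1.0 / k - decay / k

def partial_likelihoods(node):
    """Return {state: P(observed leaves below node | node is in that state)}.

    A leaf is a single character; an internal node is a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in ALPHABET}
    left, t_left, right, t_right = node
    below_left = partial_likelihoods(left)
    below_right = partial_likelihoods(right)
    return {
        s: sum(transition_prob(s, x, t_left) * below_left[x] for x in ALPHABET)
        * sum(transition_prob(s, y, t_right) * below_right[y] for y in ALPHABET)
        for s in ALPHABET
    }

def column_likelihood(tree):
    """Likelihood of one alignment column under a uniform root distribution."""
    partial = partial_likelihoods(tree)
    return sum(partial[s] / len(ALPHABET) for s in ALPHABET)

# One column with a gap in the third sequence; branch lengths are hypothetical.
print(column_likelihood((("A", 0.1, "A", 0.2), 0.3, "-", 0.4)))
```

Because the gap is just another state, the recursion visits each node once per column, so the cost per column matches that of the usual four-state peeling up to the larger alphabet.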

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central